Using the Multi-Stream Approach for Continuous Audio-Visual Speech Recognition: Experiments on the M2VTS Database
Author
Abstract
The Multi-Stream automatic speech recognition approach was investigated in this work as a framework for Audio-Visual data fusion and speech recognition. This method presents many potential advantages for such a task. It particularly allows for synchronous decoding of continuous speech while still allowing for some asynchrony of the visual and acoustic information streams. First, the Multi-Stream formalism is briefly recalled. Then, on top of the Multi-Stream motivations, experiments on the M2VTS multimodal database are presented and discussed. To our knowledge, these are the first experiments addressing multi-speaker continuous Audio-Visual Speech Recognition (AVSR). It is shown that the Multi-Stream approach can yield improved Audio-Visual speech recognition performance when the acoustic signal is corrupted by noise as well as for clean speech.
Similar resources
Efficient likelihood computation in multi-stream HMM based audio-visual speech recognition
Multi-stream hidden Markov models have recently been introduced in the field of automatic speech recognition as an alternative to single-stream modeling of sequences of speech informative features. In particular, they have been very successful in audio-visual speech recognition, where features extracted from video of the speaker’s lips are also available. However, in contrast to single-stream m...
Product HMMs for audio-visual continuous speech recognition using facial animation parameters
The use of visual information in addition to acoustic can improve automatic speech recognition. In this paper we compare different approaches for audio-visual information integration and show how they affect automatic speech recognition performance. We utilize Facial Animation Parameters (FAPs), supported by the MPEG-4 standard for the visual representation as visual features. We use both Singl...
Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition
fMPE is a recently introduced discriminative training technique that uses the Minimum Phone Error (MPE) discriminative criterion to train a feature-level transformation. In this paper we investigate fMPE trained audio/visual features for multistream HMM-based audio-visual speech recognition. A flexible, layer-based implementation of fMPE allows us to combine the visual information with the ...
Stream weight estimation using higher order statistics in multi-modal speech recognition
In this paper, stream weight optimization for multi-modal speech recognition using audio information and visual information is examined. In a conventional multi-stream Hidden Markov Model (HMM) used in multi-modal speech recognition, a constraint in which the summation of audio and visual weight factors should be one is employed. This means balance between transition and observation probabiliti...
Fused HMM-adaptation of multi-stream HMMs for audio-visual speech recognition
A technique known as fused hidden Markov models (FHMMs) was recently proposed as an alternative multi-stream modelling technique for audio-visual speaker recognition. In this paper we show that for audio-visual speech recognition (AVSR), FHMMs can be adopted as a novel method of training synchronous MSHMMs. MSHMMs, as proposed by several authors for use in AVSR, are jointly trained on both the ...